Maintaining Frequent Itemsets over High-Speed Data Streams
Authors
Abstract
We propose a false-negative approach to approximate the set of frequent itemsets (FIs) over a sliding window. Existing approximate algorithms use an error parameter, ε, to control the accuracy of the mining result. However, the use of ε leads to a dilemma: a smaller ε gives a more accurate mining result but higher computational complexity, while a larger ε degrades the mining accuracy. We address this dilemma by introducing a progressively increasing minimum support function: the longer an itemset is retained in the window, the closer its required minimum support approaches the minimum support of an FI. As a result, the number of potential FIs to be maintained is greatly reduced. Our experiments show that our algorithm not only attains highly accurate mining results but also runs significantly faster and consumes less memory than existing algorithms for mining FIs over a sliding window.
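The abstract does not give the exact form of the progressively increasing minimum support function, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes a window of n equal-size segments of b transactions each, an FI threshold ratio sigma, and a hypothetical relaxation ratio r in (0, 1]. An itemset tracked over k segments must reach a threshold that starts at r * sigma * b and rises linearly to the full FI threshold sigma * n * b at k = n. The names `progressive_minsup` and `fill_window` are illustrative, itemsets are capped at `max_len` items for brevity, and the sliding (expiration) of old segments is omitted.

```python
from collections import Counter
from itertools import combinations


def progressive_minsup(k, n, b, sigma, r):
    """HYPOTHETICAL threshold for an itemset tracked over k of n segments.

    Starts relaxed (r * sigma * b for k = 1) and rises to the full FI
    threshold sigma * n * b once the itemset has been tracked for k = n.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    ratio = 1.0 if n == 1 else r + (1 - r) * (k - 1) / (n - 1)
    return sigma * b * k * ratio


def fill_window(segments, sigma, r, max_len=2):
    """Fill one window of equal-size segments (expiration is omitted).

    A pruned itemset loses its earlier counts, so the result may miss some
    FIs (false negatives) but never reports an infrequent itemset.
    """
    n, b = len(segments), len(segments[0])
    if any(len(seg) != b for seg in segments):
        raise ValueError("segments must have equal size")
    tracked = {}  # itemset -> (count since tracking began, segments tracked)
    for seg in segments:
        # Count every itemset of up to max_len items in this segment.
        seg_counts = Counter(
            combo
            for t in seg
            for size in range(1, max_len + 1)
            for combo in combinations(sorted(set(t)), size)
        )
        # Age each tracked itemset by one segment and add its new count.
        for itemset, (count, k) in list(tracked.items()):
            tracked[itemset] = (count + seg_counts.get(itemset, 0), k + 1)
        # Start tracking itemsets that are not tracked yet.
        for itemset, count in seg_counts.items():
            tracked.setdefault(itemset, (count, 1))
        # Prune itemsets that fall below the threshold for their age.
        tracked = {
            s: (c, k) for s, (c, k) in tracked.items()
            if c >= progressive_minsup(k, n, b, sigma, r)
        }
    # Counts are lower bounds, so c >= sigma * n * b implies a true FI.
    return {s: c for s, (c, _) in tracked.items() if c >= sigma * n * b}


segments = [
    [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}],
    [{"a", "b"}, {"a", "b"}, {"c"}, {"a"}],
]
print(fill_window(segments, sigma=0.5, r=0.5))  # {('a',): 6, ('b',): 4}
```

In this toy run, {c} and {a, b} survive the relaxed first-segment threshold (1.0) but are pruned at the stricter two-segment threshold (4.0), which is how a rising threshold keeps the set of tracked candidates small.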
Similar resources
DELAY-CFIM: A Sliding Window Based Method on Mining Closed Frequent Itemsets over High-Speed Data Streams
Closed frequent itemset mining plays an essential role in data stream mining. It can be used for business decisions, market basket analysis, and similar tasks. Most methods for mining closed frequent itemsets store the stream information in a compact data structure as the data is generated. Whenever a query is submitted, they output all closed frequent itemsets. However, the online processing of existing approache...
CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data
Recently, frequent itemset mining over data streams has attracted much attention. However, mining closed itemsets from data streams has not been well addressed. The main difficulty lies in the high maintenance cost arising from the exact model definition of closed itemsets and the dynamically changing nature of data streams. In the data stream scenario, it is sufficient to mine only approximate frequent...
A false negative approach to mining frequent itemsets from high speed transactional data streams
Mining frequent itemsets from transactional data streams is challenging because of the exponential explosion of itemsets and the limited memory space available for mining frequent itemsets. Given a domain of I unique items, the possible number of itemsets can be up to 2^I - 1. When the length of a data stream approaches a very large number N, the possibility of an itemset to be frequent be...
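To make the itemset explosion concrete: every non-empty subset of the I items is a candidate itemset, giving 2^I - 1 candidates. The short Python sketch below enumerates them for a small domain and checks the count against that formula; the function names are illustrative only.

```python
from itertools import combinations


def all_itemsets(items):
    """Enumerate every non-empty itemset over the given unique items."""
    items = sorted(set(items))
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def itemset_count(num_items):
    """Number of non-empty itemsets over num_items unique items."""
    return 2 ** num_items - 1


print(len(all_itemsets("abcd")), itemset_count(4))  # 15 15
print(itemset_count(20))  # 1048575
```

Already at 20 items there are over a million candidates, which is why stream algorithms cannot afford to count every itemset.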
Mining Frequent Itemsets Over Arbitrary Time Intervals in Data Streams
Mining frequent itemsets over a stream of transactions presents difficult new challenges over traditional mining in static transaction databases. Stream transactions can only be looked at once and streams have a much richer frequent itemset structure due to their inherent temporal nature. We examine a novel data structure, an FP-stream, for maintaining information about itemset frequency historie...
An Efficient Algorithm for Maintaining Frequent Closed Itemsets over Data Stream
Data mining refers to the process of revealing unknown and potentially useful information from a large database. Frequent itemset mining is one of the foundational problems in data mining; it aims to discover the sets of products that customers frequently purchase together in a transaction database. However, there may be a large number of patterns generated from the database, and many of the...
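Closed frequent itemsets, the subject of this entry, are the frequent itemsets that have no proper superset with the same support; keeping only them shrinks the output without losing support information. The brute-force Python sketch below illustrates that standard definition on a toy database. It is not the listed paper's algorithm, and the names `frequent_itemsets` and `closed_itemsets` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute force: count every itemset and keep those with support >= min_count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                result[itemset] = count
    return result


def closed_itemsets(frequent):
    """Keep itemsets with no proper superset of equal support.

    Assumes `frequent` holds ALL frequent itemsets; any superset with the
    same support as a frequent itemset is itself frequent, so it is present.
    """
    return {
        s: sup for s, sup in frequent.items()
        if not any(s < t and sup == frequent[t] for t in frequent)
    }


tx = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}]
freq = frequent_itemsets(tx, min_count=2)  # {a}: 4, {b}: 3, {a, b}: 3
print(closed_itemsets(freq))  # {b} is dropped: {a, b} has the same support
```

Here {b} is not closed because every transaction containing b also contains a, so {a, b} carries the same support and makes {b} redundant.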
Mining Recent Frequent Itemsets in Sliding Windows over Data Streams
This paper considers the problem of mining recent frequent itemsets over data streams. As the data grows without limit at a rapid rate, it is hard to track changes in the frequent itemsets over data streams. We propose an efficient one-pass algorithm in sliding windows over data streams with an error bound guarantee. This algorithm does not need to refer to obsolete transactions when ...